Contrasting Machine Learning Approaches for Microtext Classification

Author

  • Jeffrey Ellen
Abstract

The goal is the classification of microtext: identifying the lines of military chat, or posts, that contain items of interest. This paper evaluates non-linear statistical data modeling techniques and compares them with our previous results using several text categorization and feature selection methodologies. The chat posts are examples of 'microtext': text that is generally very short, semi-structured, and characterized by unstructured or informal grammar and language. These three attributes lead to results that differ from those obtained on traditional long-form free text. In this paper, we further characterize microtext. Highly accurate classification of microtext entries is crucial for enabling more complex information extraction. Although this study focused specifically on tactical updates via chat, we believe the findings apply to content of similar linguistic structure regardless of domain, including other microtext sources such as IM/XMPP, SMS, voice transcriptions, and micro-blogging services such as Twitter(tm).

Similar resources

Learning Ontologies from the Web for Microtext Processing

We build a mechanism to form an ontology of entities that improves the relevance of matching and searching microtext. Ontology construction starts from the seed entities and mines the web for new entities associated with them. To form these new entities, machine learning of syntactic parse trees (syntactic generalization) is applied to form commonalities between various search results for existi...

A Microtext Corpus for Persuasion Detection in Dialog

Automatic detection of persuasion is essential for machine interaction on the social web. To facilitate automated persuasion detection, we present a novel microtext corpus derived from hostage negotiation transcripts as well as a detailed manual (codebook) for persuasion annotation. Our corpus, called the NPS Persuasion Corpus, consists of 37 transcripts from four sets of hostage negotiation tr...

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT and HOG. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...

Prostate cancer radiomics: A study on IMRT response prediction based on MR image features and machine learning approaches

Introduction: To develop different radiomic models based on radiomic features and machine learning methods to predict early intensity modulated radiation therapy (IMRT) response. Materials and Methods: Thirty prostate patients were included. All patients underwent pre- and post-IMRT T2-weighted and apparent diffusion coefficient (ADC) magnetic resonance imagi...

Fault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods

The anti-friction bearing (AFB) is a very important machine component, and its unscheduled failure causes malfunctions in a wide range of rotating machinery, resulting in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...

Publication date: 2011